Abstract

In this paper we consider the fundamental problem of approximating the diameter $D$ of directed
or undirected graphs. In a seminal paper, Aingworth, Chekuri, Indyk and Motwani [SIAM J. Comput.
1999] presented an algorithm that computes in $\tilde{O}(m\sqrt{n} + n^2)$ time an estimate $\hat{D}$ for the
diameter of an $n$-node, $m$-edge graph, such that $\lfloor 2D/3 \rfloor \le \hat{D} \le D$. In this paper we present an
algorithm that produces the same estimate in $\tilde{O}(m\sqrt{n})$ expected running time. We then provide strong
evidence that a better approximation may be hard to obtain if we insist on an $O(m^{2-\varepsilon})$ running time.
In particular, we show that if there is some constant $\varepsilon > 0$ so that there is an algorithm for undirected
unweighted graphs that runs in $O(m^{2-\varepsilon})$ time and produces an approximation $\hat{D}$ such that
$(2/3 + \varepsilon)D \le \hat{D} \le D$, then SAT for CNF formulas on $n$ variables can be solved in $O^*((2-\delta)^n)$
time for some constant $\delta > 0$, and the strong exponential time hypothesis of [Impagliazzo, Paturi,
Zane JCSS'01] is false.

Motivated by this somewhat negative result, we study whether it is possible to obtain a better
approximation for specific cases. For unweighted directed or undirected graphs, we show that if
$D = 3h + z$, where $h \ge 0$ and $z \in \{0, 1, 2\}$, then it is possible to report in
$\tilde{O}(\min\{m^{2/3} n^{4/3}, m^{2-1/(2h+3)}\})$ time an estimate $\hat{D}$ such that $2h + z \le \hat{D} \le D$, thus giving a
better than 3/2 approximation whenever $z \ne 0$. This is significant for constant values of $D$, which is
exactly when the diameter approximation problem is hardest to solve. For the case of unweighted
undirected graphs we present an $\tilde{O}(m^{2/3} n^{4/3})$ time algorithm that reports an estimate $\hat{D}$ such that
$\lfloor 4D/5 \rfloor \le \hat{D} \le D$.
1 Introduction



The diameter of a graph is the longest of all distances between vertices in the graph. The diameter is a 
natural and fundamental graph parameter, and computing it efficiently has many applications (e.g. [3]).
Essentially, the only known way to determine the diameter of a graph with arbitrary edge weights is to 
compute the distances between all pairs of vertices in the graph, that is, to solve the all-pairs shortest paths 
problem (APSP), and then to find the maximum distance. Because of this, some researchers have conjectured 
that APSP and diameter in weighted graphs may be equivalent in some sense (e.g. |2T1 and [El). The 
fastest algorithms for computing APSP and hence for computing the diameter for directed or undirected 
graphs on n nodes and m edges with arbitrary edge weights and no negative cycles have a running time of 
$O(\min\{n^3 \log\log^3 n / \log^2 n,\ mn + n^2 \log\log n\})$ B1 I16II .

For the special case of dense directed or undirected unweighted graphs, one can compute the diameter
by reducing its computation to fast matrix multiplication, thus obtaining $\tilde{O}(n^{\omega})$ time algorithms, where
$\omega < 2.38$ is the matrix multiplication exponent J6) [191 HOI. In fact, any known algorithm for diameter in
dense $n$-node unweighted graphs running in $T(n)$ time can also be used to compute the Boolean product of
two $n \times n$ Boolean matrices in $O(T(n))$ time. This led to conjectures 0CQ that computing the diameter
in dense unweighted graphs and Boolean matrix multiplication (BMM) may be equivalent.

For the special case of sparse directed or undirected unweighted graphs, the best known algorithm for
both APSP and diameter does breadth-first search (BFS) from every node and hence runs in $O(mn)$ time.
For sparse graphs with $m = O(n)$, the running time is $\Theta(n^2)$, which is natural for APSP since the algorithm
needs to output $n^2$ distances. However, for the diameter the output is a single integer, so it is not immediately
clear why one should spend $\Omega(n^2)$ time to compute it. In this paper, we show, somewhat surprisingly, that
breaking this seeming $n^2$ barrier would have major consequences for the complexity of NP-hard problems
such as SAT.
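
As a point of reference, the BFS-from-every-node baseline described above can be sketched as follows. This is a minimal illustration, not code from the paper: the adjacency-list representation (a dict mapping each vertex to a list of out-neighbours) and the function names are assumptions made here.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def exact_diameter(adj):
    """Exact diameter via one BFS per vertex, O(mn) time in total.

    Returns float('inf') when some vertex cannot reach every other vertex.
    """
    diameter = 0
    for source in adj:
        dist = bfs_distances(adj, source)
        if len(dist) < len(adj):
            return float("inf")
        diameter = max(diameter, max(dist.values()))
    return diameter


# Undirected path 0 - 1 - 2 - 3, stored with both edge directions.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(exact_diameter(path))
```

Each BFS costs $O(m)$ time and there are $n$ of them, which is where the $O(mn)$ bound comes from.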

A natural question is whether one can get substantially faster algorithms for the diameter by settling
for an approximation. A $c$-approximation algorithm for the diameter $D$ of a graph for $c > 1$ provides an
estimate $\hat{D}$ such that $D/c \le \hat{D} \le D$. It is well known that a 2-approximation for the diameter in directed or
undirected graphs with nonnegative weights is easy to achieve in $\tilde{O}(m)$ time using Dijkstra's algorithm from
and to an arbitrary node. Dor, Halperin and Zwick [8] showed that any $(2-\varepsilon)$-approximation algorithm for
APSP, even in unweighted graphs, running in $T(n)$ time would imply an $O(T(n))$ time algorithm for BMM, and
hence a priori it could be that $(2-\varepsilon)$-approximating the diameter of a graph may also require solving BMM.
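
The 2-approximation mentioned above follows from the triangle inequality: for any vertex $v$, $d(a, b) \le d(a, v) + d(v, b)$, so the larger of the two search depths at $v$ is at least $D/2$. A minimal sketch for the unweighted case is given below; it uses BFS in place of Dijkstra's algorithm, and the graph representation and function names are illustrative assumptions.

```python
from collections import deque


def eccentricity(adj, source):
    """Largest hop distance from `source` (assumes every vertex is reachable)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def reverse_graph(adj):
    """Graph with every edge direction flipped."""
    rev = {u: [] for u in adj}
    for u, nbrs in adj.items():
        for v in nbrs:
            rev[v].append(u)
    return rev


def two_approx_diameter(adj, v):
    """Estimate with D/2 <= estimate <= D, using one BFS from v and one BFS to v."""
    return max(eccentricity(adj, v), eccentricity(reverse_graph(adj), v))


# Undirected path on 5 vertices (diameter 4), probed from the middle vertex.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 5] for i in range(5)}
estimate = two_approx_diameter(path, 2)
print(estimate)
```

Probing the path from its middle vertex shows that the factor 2 can be tight, while probing from an endpoint happens to return the exact diameter.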

In their seminal paper, Aingworth, Chekuri, Indyk and Motwani [1] showed that it is in fact possible
to get a subcubic $(2-\varepsilon)$-approximation algorithm for the diameter of graphs with nonnegative weights
without resorting to fast matrix multiplication. In particular, they designed an $\tilde{O}(m\sqrt{n} + n^2)$ time algorithm
computing an estimate $\hat{D}$ that satisfies $\lfloor 2D/3 \rfloor \le \hat{D} \le D$. Their algorithm has several important and
interesting properties. It is the only known algorithm for approximating the diameter polynomially faster
than $O(mn)$ for every $m$ that is superlinear in $n$. It always runs in truly subcubic time even in dense graphs,
and does not explicitly compute all-pairs approximate shortest paths.

A natural question is whether there is an almost linear time approximation scheme for the diameter
problem: an algorithm that for any constant $\varepsilon > 0$ runs in $\tilde{O}(m)$ time and returns an estimate $\hat{D}$ such that
$(1-\varepsilon)D \le \hat{D} \le D$. Such an algorithm would be of immense interest, and has not so far been explicitly
ruled out, even conditionally. In this paper we give strong evidence that a fast $(3/2-\varepsilon)$-approximation
algorithm for the diameter may be very hard to find, even for undirected unweighted graphs. We show:

Theorem 1 Suppose there is a constant $\varepsilon > 0$ so that there is a $(3/2-\varepsilon)$-approximation algorithm for the
diameter in $m$-edge undirected unweighted graphs that runs in $O(m^{2-\varepsilon})$ time for every $m$. Then, SAT for
CNF formulas on $n$ variables can be solved in $O^*((2-\delta)^n)$ time for some constant $\delta > 0$.

The fastest known algorithm for CNF-SAT is the exhaustive search algorithm that runs in $O^*(2^n)$ time
by trying all possible $2^n$ assignments to the variables. It is a major open problem whether there is a faster
algorithm. Several other NP-hard problems are known to be equivalent to CNF-SAT so that if one of these
problems has a faster algorithm than exhaustive search, then all of them do [7]. Hence, our result also
implies that if the diameter can be approximated fast enough, then also problems such as Hitting Set, Set
Splitting, or NAE-SAT, all seemingly unrelated to the diameter, can be solved faster than exhaustive search.
The strong exponential time hypothesis (SETH) of Impagliazzo, Paturi, and Zane [10, 11] implies that there
is no improved $O^*((2-\delta)^n)$ time algorithm for CNF-SAT, and hence our result also implies that there is
no $(3/2-\varepsilon)$-approximation algorithm for the diameter running in $O(m^{2-\varepsilon})$ time unless SETH fails. (We
elaborate on this hypothesis later on in the paper.)

We prove Theorem 1 by showing that an $O(n^{2-\varepsilon})$ time, $(3/2-\varepsilon)$-approximation algorithm for the
diameter in sparse graphs with $m = O(n)$ would imply an $O^*((2-\delta)^n)$ time CNF-SAT algorithm. This
implies that unless SETH fails, $O(n^2)$ time is essentially required to get a $(3/2-\varepsilon)$-approximation algorithm
for the diameter in sparse graphs, within $n^{o(1)}$ factors. Hence, within $n^{o(1)}$ factors, the time for
$(3/2-\varepsilon)$-approximating the diameter in a sparse graph is the same as the time required for computing APSP exactly!

Even more concretely, we prove Theorem 1 by showing that distinguishing whether the diameter of a
given undirected unweighted graph is 2 or at least 3 fast enough would imply an improved SAT algorithm.
(Any $(3/2-\varepsilon)$-approximation algorithm for the diameter would be able to distinguish between graphs
of diameter 2 and 3.) The fastest algorithms for this special case of the diameter problem still run in
$\tilde{O}(\min\{mn, n^{\omega}\})$ time, and several papers have asked whether one can do better (SOD. In 1987, Chung Q
actually conjectured that this problem may be equivalent to BMM, so that any subcubic algorithm for it
can be converted to a subcubic algorithm for BMM. Aingworth et al. [1] conjectured that if there is a
polynomially faster than $O(mn)$ time algorithm for this problem, then one can use it to construct a fast
algorithm that computes the diameter exactly. These conjectures remain open, but Theorem 1 shows that
the 2 vs 3 diameter problem may be hard to solve very efficiently for a different reason.

Theorem 1 shows that unless SETH fails, the best one can do with an $O(m^{2-\varepsilon})$ time algorithm is a
3/2-approximation. The Aingworth et al. 3/2-approximation algorithm almost achieves an $O(m^{2-\varepsilon})$
runtime, except for very sparse graphs, where it still runs in $\Omega(n^2)$ time. We notice that with a slight
change in the parameters of the algorithm, the Aingworth et al. running time can be modified to be
$\tilde{O}(m^{2/3} n) \le \tilde{O}(m^{2-1/3})$. We then investigate whether we can obtain a 3/2-approximation algorithm
that improves upon these two runtimes of the Aingworth et al. algorithm. We give a new 3/2-approximation
algorithm with $\tilde{O}(m\sqrt{n})$ expected running time, thus removing the additive $n^2$ term from the original
Aingworth et al. runtime with some randomization, and also beating $\tilde{O}(m^{2/3} n)$. Our algorithm is the
first improvement over the Aingworth et al. diameter algorithm. The improvement is especially noticeable
for sparse graphs (with $m = O(n)$), in which our algorithm runs in $\tilde{O}(n^{1.5})$ time. Previously, such a
result was known only for sparse planar graphs |2"lR. We also show that in some special cases our
algorithm obtains an approximation that is better than 3/2.

Theorem 2 Let $G = (V, E)$ be a directed or an undirected graph with diameter $D = 3h + z$, where
$h \ge 0$ and $z \in \{0, 1, 2\}$. In $\tilde{O}(m\sqrt{n})$ expected time one can compute an estimate $\hat{D}$ of $D$ such that
$2h + z \le \hat{D} \le D$ for $z \in \{0, 1\}$ and $2h + 1 \le \hat{D} \le D$ for $z = 2$.

For undirected or directed graphs with arbitrary nonnegative weights, we also obtain the following. 

1 disregarding polylogarithmic factors 
Theorem 3 Let $G = (V, E)$ be a directed or an undirected graph with nonnegative edge weights and
diameter $D$. In $\tilde{O}(m\sqrt{n})$ expected time one can compute an estimate $\hat{D}$ of $D$ such that $\lfloor 2D/3 \rfloor \le \hat{D} \le D$.

We further investigate whether one can improve the approximation for unweighted graphs obtained in
Theorem 2 by possibly increasing the runtime, while still keeping it subcubic in $n$. Notice that in Theorem 2
the estimate $\hat{D}$ is at least $2h + z$ for $z \in \{0, 1\}$ and only at least $2h + 1$ for $z = 2$. This only guarantees that
$\hat{D} \ge \lfloor 2D/3 \rfloor$. (This is also the case for the algorithm of Aingworth et al. [1].)

We show that with a slightly larger (but still subcubic) running time it is possible to get an estimate $\hat{D}$ of
$D$ such that $2h + z \le \hat{D}$ for any value $z \in \{0, 1, 2\}$, thus guaranteeing that $\hat{D} \ge \lceil 2D/3 \rceil$. This is significant
when $D$ is a constant, and also shows that when $z \ne 0$, the approximation factor is strictly better than 3/2:
$$\frac{3h + z}{2h + z} = \frac{3}{2} - \frac{1}{4h/z + 2} \le \frac{3}{2} - \frac{1}{4h + 2} < \frac{3}{2}.$$

We note that approximating the diameter is most challenging when the diameter is small. When the
input graph has diameter $D \ge n^{\varepsilon}$ for some $\varepsilon > 0$, one can efficiently find an arbitrarily good approximation
by random sampling: if you randomly sample $C n^{1-\varepsilon}/\delta \cdot \log n$ nodes, then with probability at least $1 - 1/n^{c}$,
one of these nodes is at distance at least $(1-\delta)D$ from an endpoint of the diameter path; hence a
$1/(1-\delta)$-approximation can be found in $\tilde{O}(m n^{1-\varepsilon}/\delta)$ time by BFS. For sparse enough graphs of diameter
however, the best known $(3/2-\varepsilon)$-approximation algorithms still compute the diameter exactly in $O(mn)$
time. Hence, it is quite interesting that we can obtain $\tilde{O}(m\sqrt{n})$ time $(3/2-\varepsilon)$-approximation algorithms
for some constant values of the diameter.

In Section 5 we prove the following theorem.

Theorem 4 Let $G = (V, E)$ be a directed or undirected unweighted graph with diameter $D = 3h + z$,
where $h \ge 0$ and $z \in \{0, 1, 2\}$. There is an $\tilde{O}(m^{2/3} n^{4/3})$ time algorithm that reports an estimate $\hat{D}$ such
that $2h + z \le \hat{D} \le D$.

Marginally, we show how to get a better estimate for undirected graphs in the same running time. 

Theorem 5 Let $G = (V, E)$ be an undirected unweighted graph with diameter $D$. There is an $\tilde{O}(m^{2/3} n^{4/3})$
time algorithm that reports an estimate $\hat{D}$ such that $\lfloor 4D/5 \rfloor \le \hat{D} \le D$.

The running time in Theorem 4, however, is $\Omega(n^2)$ for sparse graphs. We hence investigate whether one
can get an estimate $\lceil 2D/3 \rceil \le \hat{D} \le D$ in $O(m^{2-\varepsilon})$ time. We show:

Theorem 6 There is an $\tilde{O}(m^{2-1/(2h+3)})$ time deterministic algorithm that computes an estimate $\hat{D}$ with
$\lceil 2D/3 \rceil \le \hat{D} \le D$ for all $m$-edge unweighted graphs of diameter $D = 3h + z$ with $h \ge 0$ and $z \in \{0, 1, 2\}$.
In particular, $\hat{D} \ge 2h + z$.

Notation. Let $G = (V, E)$ denote a graph. It can be directed or undirected; this will be specified in each
context. If the graph is weighted, then there is a function on the edges $w : E \to \mathbb{Q}^{+} \cup \{0\}$. Unless explicitly
specified, the graphs we consider are unweighted.

For any $u, v \in V$, let $d(u, v)$ denote the distance from $u$ to $v$ in $G$. Let $BFS^{in}(v)$ and $BFS^{out}(v)$ be the
incoming and outgoing breadth-first search (BFS) trees of $v$, respectively, that is, the BFS trees starting
at $v$ in $G$ with the edges reversed and in $G$. Let $d^{in}(v)$ be the depth of $BFS^{in}(v)$, i.e. the largest
distance from a vertex of $BFS^{in}(v)$ to $v$. Similarly, let $d^{out}(v)$ be the depth of $BFS^{out}(v)$.

For $h \le d^{in}(v)$, let $BFS^{in}(v, h)$ be the vertices in the first $h$ levels of $BFS^{in}(v)$. Similarly, for
$h \le d^{out}(v)$, let $BFS^{out}(v, h)$ be the vertices in the first $h$ levels of $BFS^{out}(v)$.

Let $N_s^{in}(v)$ ($N_s^{out}(v)$) be the set of the $s$ closest incoming (outgoing) vertices of $v$, where ties are broken
by taking the vertex with the smaller id. We assume throughout the paper that for each $v$ and each $s < n$,
$|N_s^{in}(v)| = |N_s^{out}(v)| = s$, as otherwise the diameter of the graph would be $\infty$, and this can be checked
with two BFS runs from and to an arbitrary node.

Let $d_s^{in}(v)$ be the largest distance from a vertex of $N_s^{in}(v)$ to $v$, and $d_s^{out}(v)$ be the largest distance from
$v$ to a vertex of $N_s^{out}(v)$. Let $d_s^{in} = \max_{v \in V} d_s^{in}(v)$ and $d_s^{out} = \max_{v \in V} d_s^{out}(v)$.
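
To make the definitions of $N_s^{out}(v)$ and $d_s^{out}(v)$ concrete, here is a small sketch that computes them by a level-by-level BFS. Two points are assumptions of this sketch rather than statements from the text: $v$ is counted as its own closest vertex (at distance 0), and the graph is a dict mapping each vertex to a list of out-neighbours. The code makes no attempt to match the running time of the partial searches used in the algorithms below.

```python
def closest_out(adj, v, s):
    """Return [(vertex, distance)] for the s vertices closest to v along outgoing edges.

    Assumptions of this sketch: v itself counts as its own closest vertex
    (distance 0), and ties are broken by the smaller vertex id.
    """
    dist = {v: 0}
    level = [v]
    chosen = []
    while level and len(chosen) < s:
        for u in sorted(level):            # smaller id wins among equal distances
            if len(chosen) < s:
                chosen.append(u)
        next_level = []
        for u in level:
            for x in adj[u]:
                if x not in dist:
                    dist[x] = dist[u] + 1
                    next_level.append(x)
        level = next_level
    return [(u, dist[u]) for u in chosen]


def d_s_out(adj, v, s):
    """Largest distance from v to a vertex among its s closest outgoing vertices."""
    return max(d for _, d in closest_out(adj, v, s))


graph = {0: [1, 2], 1: [3], 2: [4], 3: [], 4: []}
nearest = closest_out(graph, 0, 4)
print(nearest)
```

The incoming variants $N_s^{in}(v)$ and $d_s^{in}(v)$ are obtained by running the same functions on the graph with all edges reversed.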

For a set $S \subseteq V$ and a vertex $v \in V$ we define $p_S(v)$ to be a vertex of $S$ such that $d(v, p_S(v)) \le d(v, w)$
for every $w \in S$, i.e. the closest vertex of $S$ to $v$.

For a degree $\Delta$ we define $p_\Delta(v)$ to be the closest vertex to $v$ of degree at least $\Delta$, that is,
$d(v, p_\Delta(v)) \le d(v, w)$ for every $w \in V$ of degree at least $\Delta$.

We use the following standard notation for running times. For a function of $n$, $f(n)$, $\tilde{O}(f(n))$ denotes
$O(f(n)\,\mathrm{polylog}\, n)$ and $O^*(f(n))$ denotes $O(f(n)\,\mathrm{poly}(n))$.

2 Diameter approximation and the Strong Exponential Time Hypothesis 

Impagliazzo, Paturi, and Zane [10, 11] introduced the Exponential Time Hypothesis (ETH) and its stronger
variant, the Strong Exponential Time Hypothesis (SETH). These two complexity hypotheses assume lower
bounds on how fast satisfiability problems can be solved. They have frequently been used as a basis for
conditional lower bounds for other concrete computational problems.

Hypothesis 1 ([10, 11]) ETH: There exists a real constant $\delta > 0$ such that 3-SAT instances on $n$ variables
and $m$ clauses cannot be solved in $2^{\delta n}\,\mathrm{poly}(m, n)$ time.

A natural question is how fast one can solve $r$-SAT as $r$ grows. Impagliazzo, Paturi, and Zane define
$$s_r = \inf\{\delta \mid \exists\, O^*(2^{\delta n}) \text{ time algorithm solving } r\text{-SAT instances with } n \text{ variables}\}, \qquad s_\infty = \lim_{r \to \infty} s_r.$$
Clearly $s_r \le s_{r+1}$, so the sequence is nondecreasing. Impagliazzo, Paturi, and Zane show that if ETH
holds, then $s_r$ also increases infinitely often. Furthermore, all known algorithms for $r$-SAT nowadays take
time $O(2^{n(1-c/r)})$ for some constant $c$ independent of $n$ and $r$ (e.g. @ El [B] El [HI [III). Because of this,
it seems plausible that $s_\infty = 1$, and this is exactly the strong exponential time hypothesis.

Hypothesis 2 ([10, 11]) SETH: $s_\infty = 1$.

One immediate consequence of SETH is that CNF-SAT on $n$ variables cannot be solved in $2^{n(1-\varepsilon)}\,\mathrm{poly}(n)$
time for any $\varepsilon > 0$. The best known algorithm for CNF-SAT is the $O^*(2^n)$ time exhaustive search algorithm
which tries all possible $2^n$ assignments to the variables, and it has been a major open problem to obtain an
improvement. Cygan et al. [7] showed that SETH is also equivalent to the assumption that several other
NP-hard problems cannot be solved faster than by exhaustive search, and the best algorithms for these problems
are the exhaustive search ones.

Assuming SETH, one can prove tight conditional lower bounds on the complexity of some problems in
P as well. The problem that we will look at is $k$-dominating set for constant $k$: given an undirected graph
$G = (V, E)$, is there a set $S$ of $k$ vertices so that every vertex $v \in V$ is either in $S$ or has an edge to
some vertex in $S$? The best known algorithm for $k$-dominating set for $k \ge 7$ runs in $n^{k+o(1)}$ time and uses
rectangular matrix multiplication lfl3l . Pătraşcu and Williams [13] showed that improving on this runtime
may be hard as it would imply faster algorithms for CNF-SAT.
Theorem 7 ([13]) Suppose there is a $k \ge 3$ and a function $f$ such that $k$-Dominating Set in an $N$-node graph
is in $O(N^{f(k)})$ time. Then CNF-SAT on $n$ variables and $m$ clauses is in time.

If $f(k) = k - \varepsilon$ for some constant $\varepsilon > 0$, then the above implies that SETH is false.
We show a strong relationship between the diameter problem in undirected unweighted graphs and
$k$-dominating set.

Theorem 8 Suppose one can distinguish between diameter 2 and 3 in an $m$-edge undirected unweighted
graph in time $O(m^{2-\varepsilon})$ for some constant $\varepsilon > 0$. Then for all integers $k \ge 2/\varepsilon$, $2k$-dominating set can
be solved in $O^*(n^{2k-\varepsilon})$ time. Moreover, CNF-SAT on $n$ variables and $m$ clauses is in
and SETH is false.

Theorem 8 immediately implies Theorem 1 in the introduction, as any $(3/2-\varepsilon)$-approximation
algorithm can distinguish between diameter 2 and 3.

Proof. Given an instance $G = (V, E)$ of $2k$-Dominating set for constant $k$, we construct an instance
of the 2 vs 3 diameter problem and we show that $2k$-Dominating set in $n$-node graphs can be solved in
$O^*(n^{2k-\delta})$ time for some constant $\delta > 0$ depending on $\varepsilon$.

Take all $k$-subsets of the vertices in $V$ and add a node for each of them to the 2 vs 3 instance $G'$. Add a
node for every vertex in $V$; call this set of nodes $V'$ and make $V'$ into a clique.

For every $k$-subset $S$ of vertices of $V$, connect $S$ to $v \in V'$ in $G'$ iff $S$ does not dominate $v$ in $G$. While
we do this we check whether each $S$ is a $k$-dominating set in $G$, and if so, we stop. From now on we can
assume that none of the $k$-subsets $S$ are dominating sets in $G$.

Now, notice that if $S$ and $T$ are two $k$-subsets so that their union is not a $(\le 2k)$-dominating set in $G$,
then the distance in $G'$ between $S$ and $T$ is 2: there is some $u$ that is dominated by neither $S$ nor $T$ and so
$S - u - T$ is a path of length 2. If, on the other hand, $S \cup T$ is a dominating set in $G$, then there is no such
path and the shortest path between $S$ and $T$ in $G'$ is to go from $S$ to some $v$ that $S$ doesn't dominate, then to
some $u$ that $T$ doesn't dominate ($V'$ is a clique) and then from $u$ to $T$, which is a path of length 3.

The distance between any $u$ and $v$ in $V'$ is 1, and the distance between any $u$ and any $S$ is at most 2: go
from $u$ to some node $v$ that $S$ doesn't dominate and then to $S$.

Hence, if there is no $2k$-dominating set in $G$, then the diameter of $G'$ is 2, and if there is one, then the
diameter of $G'$ is 3. $G'$ has $\binom{n}{k} + n$ nodes and at most $O(n \cdot \binom{n}{k}) \le O(n^{k+1})$ edges.

Since we can solve the diameter problem in $O(m^{2-\varepsilon})$ time, applying that algorithm to $G'$ solves
$2k$-dominating set in $G$ for any $k \ge 2$ in time $O(n^{2k+2-\varepsilon k-\varepsilon})$.

We want this to be $O(n^{2k-\delta})$ for some $\delta > 0$, so it suffices to pick $k$ so that $-\delta \ge 2 - \varepsilon(k + 1)$. If we
want $\delta = \varepsilon$, then $k \ge 2/\varepsilon$ suffices. □
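
The construction in the proof above can be illustrated on small inputs as follows. This is only a sketch under stated assumptions: the input graph is a dict mapping each vertex to the set of its neighbours, it has at least $2k$ vertices, and an exact BFS-based diameter routine stands in for the hypothetical fast 2 vs 3 algorithm. All function names are chosen here for illustration.

```python
from collections import deque
from itertools import combinations


def build_instance(adj, k):
    """Build G' from an undirected graph `adj` (vertex -> set of neighbours).

    Returns (g_prime, None), or (None, S) if some k-subset S already dominates G.
    """
    vertices = sorted(adj)
    # One node per original vertex; these nodes form the clique V'.
    g_prime = {("v", u): {("v", x) for x in vertices if x != u} for u in vertices}
    for subset in combinations(vertices, k):
        dominated = set(subset).union(*(adj[u] for u in subset))
        missed = [u for u in vertices if u not in dominated]
        if not missed:
            return None, subset
        node = ("S", subset)
        g_prime[node] = set()
        for u in missed:                   # edge S - u iff S does not dominate u
            g_prime[node].add(("v", u))
            g_prime[("v", u)].add(node)
    return g_prime, None


def diameter(adj):
    """Exact diameter by BFS from every node (stand-in for the assumed fast test)."""
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for x in adj[u]:
                if x not in dist:
                    dist[x] = dist[u] + 1
                    queue.append(x)
        best = max(best, max(dist.values()))
    return best


def has_dominating_set_of_size_2k(adj, k):
    """Decide 2k-dominating set through the diameter of G' (assumes n >= 2k)."""
    g_prime, early = build_instance(adj, k)
    if early is not None:                  # a k-subset dominates, so pad it to size 2k
        return True
    return diameter(g_prime) >= 3


path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(has_dominating_set_of_size_2k(path4, 1))
```

On the 4-vertex path with $k = 1$, no single vertex dominates the graph, but $\{1, 2\}$ does, so the nodes for $\{1\}$ and $\{2\}$ end up at distance 3 in $G'$.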

3 The algorithm of Aingworth et al. 

In this section we revisit the algorithm of Aingworth, Chekuri, Indyk and Motwani [1], that computes a
3/2-approximation of the diameter of a directed (or undirected) graph in $\tilde{O}(m\sqrt{n} + n^2)$ time. (The algorithm
can also be made to work for graphs with nonnegative weights with roughly the same running time and
approximation factor. In this section we only focus on the algorithm for unweighted graphs.)

Let $s$ be a given parameter in $[1, n]$. The algorithm works as follows. First, it computes $N_s^{out}(v)$ for every
$v \in V$. Then, for a vertex $w$ where $d_s^{out}(w) = d_s^{out}$, it computes $BFS^{out}(w)$, and for every $u \in N_s^{out}(w)$ it
computes $BFS^{in}(u)$. Next, it computes a set $S$ that hits $N_s^{out}(v)$ for every $v \in V$, and for every $u \in S$ it
computes $BFS^{out}(u)$. As an estimate, the algorithm returns the depth of the deepest computed BFS tree.
The next lemma appears in [1]. We state it for completeness.

Lemma 1 The running time of the algorithm is $\tilde{O}(ns^2 + (n/s + s)m)$.

Aingworth et al. set $s = \sqrt{n}$ and obtain their running time. We note that if one sets $s = m^{1/3}$ instead,
one can get a runtime of $\tilde{O}(m^{2/3} n)$ that is better for sparse graphs; we later show that both of these runtimes
can be improved with randomization.

We now analyze the quality of the estimate returned by the algorithm. Aingworth et al. [1] proved that
this estimate is at least $\lfloor 2D/3 \rfloor$ in graphs of diameter $D$. Here we present a tighter analysis.

Lemma 2 Let $G = (V, E)$ be a directed graph with diameter $D = 3h + z$, where $h \ge 0$ and $z \in \{0, 1, 2\}$.
Let $\hat{D}$ be the estimate returned by the algorithm. For $z \in \{0, 1\}$, we have $2h + z \le \hat{D} \le D$. For $z = 2$, we
have that $2h + 1 \le \hat{D} \le D$.

Proof. Let $a, b \in V$ be such that $d(a, b) = D$. First notice that the algorithm always returns the depth of some
shortest paths tree and hence $\hat{D} \le D$.

Now, if $d_s^{out}(w) \le h$ then also $d_s^{out}(a) \le h$, and as $S$ hits $N_s^{out}(a)$, one of the BFS trees computed
for vertices of $S$ has depth at least $2h + z$. Hence, assume that $d_s^{out}(w) > h$. We can also assume that
$d^{out}(w) < 2h + z$, as otherwise when we compute $BFS^{out}(w)$ we would return a depth of at least $2h + z$.

As $d^{out}(w) < 2h + z$, also $d(w, b) < 2h + z$. Since $d_s^{out}(w) > h$, we have that $BFS^{out}(w, h) \subseteq
N_s^{out}(w)$. Hence there is a vertex $w' \in N_s^{out}(w)$ on the path from $w$ to $b$ such that $d(w, w') = h$ and hence
$d(w', b) < h + z$. Since $d(a, b) = 3h + z$, we must have that $d(a, w') \ge 2h + 1$. As the algorithm computes
$BFS^{in}(u)$ for every $u \in N_s^{out}(w)$, in particular, it computes $BFS^{in}(w')$, and returns an estimate $\ge 2h + 1$.
For $z \in \{0, 1\}$, $d(a, w') \ge 2h + 1 \ge 2h + z$ and hence the final estimate returned is always at least $2h + z$.
For $z = 2$ we only have that $d(a, w') \ge 2h + 1$, and if the algorithm returns $d(a, w')$ as an estimate, it may
return $2h + 1$ instead of $2h + z$. □



4 Improving the running time 

The algorithm of Aingworth et al. [1] runs in $\tilde{O}(ns^2 + (n/s + s)m)$ time. In this section we show that it
is possible to get rid of the $ns^2$ term, while keeping the quality of the estimate unchanged. By choosing
$s = \sqrt{n}$, we get an algorithm running in $\tilde{O}(m\sqrt{n})$ time.

The term of $ns^2$ in the running time comes from the computation of $N_s^{out}(v)$ for every $v \in V$. This
computation is done to accomplish two tasks. One task is to obtain $d_s^{out}(v)$ for every $v \in V$ and then to use
it to find a vertex $w$ such that $d_s^{out}(w) = d_s^{out}$. A second task is to obtain, deterministically, a hitting set $S$ of
size $\tilde{O}(n/s)$ that hits the set $N_s^{out}(v)$ of every $v \in V$.

Our main idea is to accomplish these two tasks without explicitly computing $N_s^{out}(v)$ for every $v \in V$.
The major step in our approach is to completely modify the first task above by picking a different
type of vertex to play the role of $w$. Making the second task above fast can be accomplished easily with
randomization. We elaborate on this below.

Our algorithm works as follows. First, it computes a hitting set by using randomization, that is, it picks
a random sample $S$ of the vertices of size $\Theta((n/s) \log n)$. This guarantees that with high probability (at least
$1 - n^{-c}$, for some constant $c$), $S \cap N_s^{out}(v) \ne \emptyset$ for every $v \in V$. This accomplishes the second task
above in $O(n)$ time, with high probability. Similarly to the algorithm of Aingworth et al. [1], our algorithm
computes $BFS^{out}(v)$ for every $v \in S$.
We now explain the main idea of our algorithm, i.e. how we are able to replace the first task from before
with a much faster step. First, for every $v \in V$ our algorithm computes the closest node of $S$, $p_S(v)$, to
$v$, by creating a new graph as follows. It adds an additional vertex $r$ with edges $(u, r)$ for every $u \in S$.
It computes $BFS^{in}(r)$ in this graph. It is easy to see that for every $v \in V$ the last vertex before $r$ on the
shortest path from $v$ to $r$ is $p_S(v)$. This step takes $O(m)$ time.
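
The auxiliary-vertex construction above is equivalent to a multi-source BFS on the reversed graph that starts from all of $S$ at once and remembers which sample vertex each search front came from. A minimal sketch, assuming the graph is a dict of out-neighbour lists with comparable vertex ids (the function name is illustrative):

```python
from collections import deque


def closest_in_sample(adj, sample):
    """For every v that can reach the sample, return (p_S(v), d(v, p_S(v))).

    Equivalent to adding a vertex r with an edge (u, r) for each u in the
    sample and computing the incoming BFS tree of r.
    """
    rev = {u: [] for u in adj}
    for u, nbrs in adj.items():
        for x in nbrs:
            rev[x].append(u)
    closest = {u: (u, 0) for u in sample}
    queue = deque(sorted(sample))
    while queue:
        x = queue.popleft()
        rep, d = closest[x]
        for u in rev[x]:                   # u has an edge into x, so d(u, S) <= d + 1
            if u not in closest:
                closest[u] = (rep, d + 1)
                queue.append(u)
    return closest


# Directed cycle 0 -> 1 -> 2 -> 3 -> 4 -> 0 with the single sample vertex 2.
cycle = {0: [1], 1: [2], 2: [3], 3: [4], 4: [0]}
closest = closest_in_sample(cycle, [2])
farthest = max(cycle, key=lambda v: closest[v][1])
print(farthest, closest[farthest])
```

The vertex maximizing the returned distance is the vertex furthest away from $S$, and the whole computation touches each edge once.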

Now, as opposed to the algorithm of Aingworth et al. that picks a vertex $w$ such that $d_s^{out}(w) = d_s^{out}$,
our algorithm finds a vertex $w \in V$ that is furthest away from $S$, i.e. such that $d(w, p_S(w)) \ge d(u, p_S(u))$
for every $u \in V$. The vertex $w$ plays the same role as its counterpart in [1]: our algorithm computes
$BFS^{out}(w)$ and obtains $N_s^{out}(w)$ from it. Finally, it computes $BFS^{in}(u)$ for every $u \in N_s^{out}(w)$. As an
estimate, the algorithm returns the depth of the deepest BFS tree that it has computed.

In the next Lemma we analyze the running time of the algorithm. 

Lemma 3 The running time of the algorithm is $\tilde{O}((n/s + s)m)$.

Proof. A hitting set $S$ is formed in $O(n)$ time. With a single BFS computation, in $O(m)$ time, we find
$p_S(v)$ for every $v \in V$, and hence also find $w$. The cost of computing a BFS tree for every $v \in S \cup N_s^{out}(w)$
is $\tilde{O}((n/s + s)m)$. □
Next, we show that the estimate produced by our algorithm is of the same quality as the estimate
produced by the algorithm of Aingworth et al., with high probability.

Lemma 4 Let $G = (V, E)$ be a directed (or undirected) graph with diameter $D = 3h + z$, where $h \ge 0$
and $z \in \{0, 1, 2\}$. Let $\hat{D}$ be the estimate returned by the above algorithm. With high probability,
$2h + z \le \hat{D} \le D$ whenever $z \in \{0, 1\}$, and $2h + 1 \le \hat{D} \le D$ whenever $z = 2$.

Proof. Let $a, b \in V$ be such that $d(a, b) = D$. Let $w$ be a vertex that satisfies $d(w, p_S(w)) \ge d(u, p_S(u))$ for
every $u \in V$.

If $d(w, p_S(w)) \le h$ then also $d(a, p_S(a)) \le h$. As the algorithm computes $BFS^{out}(v)$ for every $v \in S$,
it follows that $BFS^{out}(p_S(a))$ is computed as well and its depth is at least $2h + z$ as required. Hence,
assume that $d(w, p_S(w)) > h$. We can assume also that $d^{out}(w) < 2h + z$, since the algorithm computes
$BFS^{out}(w)$ and if $d^{out}(w) \ge 2h + z$ then it computes a BFS tree of depth at least $2h + z$ as required.

Since $d^{out}(w) < 2h + z$ it follows that $d(w, b) < 2h + z$. Moreover, since $d(w, p_S(w)) > h$ and
$S$ hits $N_s^{out}(w)$ whp, we must have that $N_s^{out}(w)$ contains a node at distance $> h$ from $w$, and hence
$BFS^{out}(w, h) \subseteq N_s^{out}(w)$. This implies that there is a vertex $w' \in N_s^{out}(w)$ on the path from $w$ to $b$ such
that $d(w, w') = h$ and hence $d(w', b) < h + z$. Since $d(a, b) = 3h + z$, we also have that $d(a, w') \ge 2h + 1$.

The algorithm computes $BFS^{in}(u)$ for every $u \in N_s^{out}(w)$, and in particular, it computes $BFS^{in}(w')$,
thus returning an estimate of at least $d(a, w') \ge 2h + 1$. Hence for $z \in \{0, 1\}$ the final estimate is always
$\ge 2h + z$, and for $z = 2$ the estimate could be $2h + 1$ but no less. □

We now turn to prove Theorem 2 from the introduction.

Reminder of Theorem 2 Let $G = (V, E)$ be a directed or an undirected graph with diameter $D = 3h + z$,
where $h \ge 0$ and $z \in \{0, 1, 2\}$. In $\tilde{O}(m\sqrt{n})$ expected time one can compute an estimate $\hat{D}$ of $D$ such that
$2h + z \le \hat{D} \le D$ for $z \in \{0, 1\}$ and $2h + 1 \le \hat{D} \le D$ for $z = 2$.

Proof. From Lemma 3 we have that if we set $s = \sqrt{n}$ the algorithm runs in $\tilde{O}(m\sqrt{n})$ worst case time.
From Lemma 4 we have that with high probability, that is $1 - n^{-c}$ for some constant $c$, the algorithm returns
an estimate of the desired quality. We now show how to convert the algorithm into a Las Vegas one so that
it always returns an estimate of the desired quality but the running time is $\tilde{O}(m\sqrt{n})$ in expectation.
Randomization is used only in order to obtain a set that hits $N_s^{out}(v)$ for every $v \in V$. The only place
that the hitting set affects the quality of the approximation is in Lemma 4, where we used the fact that, whp,
$S$ contains a node of $N_s^{out}(w)$, so that if $d(w, S) > h$, $N_s^{out}(w)$ contains a node at distance $> h$ from $w$.

Note that we compute $N_s^{out}(w)$ and we can check whether $S$ intersects it in $O(s)$ time. If it doesn't, we
can rerun the algorithm until we have verified that $S \cap N_s^{out}(w) \ne \emptyset$. Since $S \cap N_s^{out}(w) = \emptyset$ holds with
very small probability, the expected running time of the algorithm is $\tilde{O}(m\sqrt{n})$ and its estimate is guaranteed
to have the required quality. □
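
Putting the steps of this section together with the verification from the proof above, the Las Vegas variant can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the graph is strongly connected and given as a dict of out-neighbour lists, $s = \lfloor\sqrt{n}\rfloor$, the sample has $\lceil c\,(n/s)\ln(n+1) \rceil$ vertices (capped at $n$) for an arbitrarily chosen constant `c`, and $N_s^{out}(w)$ is taken to be the first $s$ vertices in BFS order from $w$, which includes $w$ itself and breaks ties by discovery order rather than by id.

```python
import math
import random
from collections import deque


def bfs(adj, source):
    """Return (dist, order): hop distances from source and vertices in BFS order."""
    dist = {source: 0}
    order = [source]
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                order.append(v)
                queue.append(v)
    return dist, order


def reverse_graph(adj):
    rev = {u: [] for u in adj}
    for u, nbrs in adj.items():
        for v in nbrs:
            rev[v].append(u)
    return rev


def approx_diameter(adj, c=2, rng=random):
    """Estimate with floor(2D/3) <= estimate <= D for a strongly connected graph."""
    n = len(adj)
    s = max(1, math.isqrt(n))
    rev = reverse_graph(adj)
    nodes = list(adj)
    sample_size = min(n, max(1, math.ceil(c * (n / s) * math.log(n + 1))))
    while True:
        sample = rng.sample(nodes, sample_size)
        # Multi-source BFS on the reversed graph: dist_to_sample[v] = d(v, p_S(v)).
        dist_to_sample = {u: 0 for u in sample}
        queue = deque(sample)
        while queue:
            x = queue.popleft()
            for u in rev[x]:
                if u not in dist_to_sample:
                    dist_to_sample[u] = dist_to_sample[x] + 1
                    queue.append(u)
        w = max(nodes, key=lambda v: dist_to_sample[v])
        dist_w, order = bfs(adj, w)
        near_w = order[:s]                 # s closest vertices to w (w included)
        if set(sample) & set(near_w):      # verification: S hits the neighbourhood of w
            break
    estimate = max(dist_w.values())
    for u in sample:                       # outgoing BFS from every sampled vertex
        estimate = max(estimate, max(bfs(adj, u)[0].values()))
    for u in near_w:                       # incoming BFS to every vertex near w
        estimate = max(estimate, max(bfs(rev, u)[0].values()))
    return estimate


path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 200] for i in range(200)}
print(approx_diameter(path, rng=random.Random(0)))
```

Because the loop only exits once the sample is verified to intersect the neighbourhood of $w$, the returned value always lies between $\lfloor 2D/3 \rfloor$ and $D$ on strongly connected inputs; only the number of resampling rounds is random.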

Just as in [1], we can make our algorithm work for graphs with nonnegative weights as well by replacing
every use of BFS with Dijkstra's algorithm. The proofs are analogous, and the running time is increased by
at most a $\log n$ factor. We obtain:

Reminder of Theorem 3 Let $G = (V, E)$ be a directed or an undirected graph with nonnegative edge
weights and diameter $D$. In $\tilde{O}(m\sqrt{n})$ expected time one can compute an estimate $\hat{D}$ of $D$ such that
$\lfloor 2D/3 \rfloor \le \hat{D} \le D$.

5 Improving the approximation for unweighted graphs 

In this section we show that in some cases it is possible to improve the approximation of the algorithm of
Aingworth et al. for unweighted graphs. Recall that for a graph with diameter $D = 3h + 2$ their algorithm
returns an estimate $\hat{D}$ such that $2h + 1 \le \hat{D} \le D$. We show that for such a case it is possible to return an
estimate $\hat{D}$ such that $2h + 2 \le \hat{D} \le D$. This is significant for small diameter values. For example, for a
graph of diameter 5 our estimate is at least 4, while the previous estimate was at least 3.

We present two algorithms that obtain this improved approximation, one works well for dense graphs 
and the other for sparse graphs. 

5.1 Dense graphs 

Our algorithm Approx-Diam($G$) works as follows. (The pseudocode is in the appendix.) First, it runs the
Aingworth et al. algorithm both on the input graph $G$ and on the input graph with the edge directions
reversed, $G^R$. Let $\hat{D}$ be the maximum value returned by these two runs. A byproduct of this step is that for
every $v \in V$ we have computed $BFS^{out}(v, d_s^{out}(v) - 1)$ and $BFS^{in}(v, d_s^{in}(v) - 1)$. Next, our algorithm
scans all pairs of vertices $u$ and $v$ and checks whether the following condition holds:
$BFS^{out}(u, d_s^{out}(u) - 1)$ and $BFS^{in}(v, d_s^{in}(v) - 1)$ are disjoint and there is no edge between
$BFS^{out}(u, d_s^{out}(u) - 1)$ and $BFS^{in}(v, d_s^{in}(v) - 1)$. Given a pair of vertices $u$ and $v$ for which the
condition holds, the algorithm updates $\hat{D}$ to be the maximum between its current value and $d_s^{out}(u) + d_s^{in}(v)$.
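
The pair condition can be illustrated with explicit radii. In the sketch below, `r_out` and `r_in` play the roles of $d_s^{out}(u) - 1$ and $d_s^{in}(v) - 1$, so the certified bound `r_out + r_in + 2` corresponds to $d_s^{out}(u) + d_s^{in}(v)$. The function names and the dict-of-lists representation are assumptions, and the balls are recomputed from scratch here rather than reused as a byproduct of an earlier step.

```python
from collections import deque


def ball(adj, source, radius):
    """Vertices within `radius` hops of `source` along the edges of `adj`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def pair_lower_bound(adj, rev, u, v, r_out, r_in):
    """Return r_out + r_in + 2 if the two balls certify d(u, v) >= that value, else None."""
    out_ball = ball(adj, u, r_out)         # vertices within r_out hops from u
    in_ball = ball(rev, v, r_in)           # vertices within r_in hops to v
    if out_ball & in_ball:
        return None
    if any(y in in_ball for x in out_ball for y in adj[x]):
        return None
    return r_out + r_in + 2


# Directed path 0 -> 1 -> 2 -> 3 -> 4 -> 5, together with its reverse.
adj = {i: [i + 1] for i in range(5)}
adj[5] = []
rev = {i: [i - 1] for i in range(1, 6)}
rev[0] = []
print(pair_lower_bound(adj, rev, 0, 5, 2, 1))
```

If the balls are disjoint and no edge leads from the first into the second, every path from $u$ to $v$ needs at least `r_out + 1` edges to leave the first ball, one more vertex outside both, and `r_in + 1` edges inside or entering the second, which gives the returned bound.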

We start by showing that the estimate reported by the algorithm is upper-bounded by the graph diameter. 

Lemma 5 Let $G = (V, E)$ be a graph of diameter $D$. If $\hat{D} = $ Approx-Diam($G$), then $\hat{D} \le D$.

Proof. If Approx-Diam(G) returns the value that it gets from one of the runs of the Aingworth et al. algorithm,
then the claim follows from Lemma [2]. If the algorithm reports $d_s^{out}(u) + d_s^{in}(v)$ for some pair of vertices
$u, v \in V$, it is because there is no edge from $BFS^{out}(u, d_s^{out}(u) - 1)$ to $BFS^{in}(v, d_s^{in}(v) - 1)$, and no vertex
in common between the two trees. This means that there is no path of length at most $d_s^{out}(u) + d_s^{in}(v) - 1$
from $u$ to $v$, and hence any path from $u$ to $v$, and in particular the shortest one, has length at least
$d_s^{out}(u) + d_s^{in}(v)$. Therefore $d_s^{out}(u) + d_s^{in}(v) \le d(u, v) \le D$, as required. □
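
One way to spell out why the two conditions rule out short paths is the following sketch. The shorthand $r_u$, $r_v$ and the path $x_0, \dots, x_k$ are introduced here for illustration only, and we assume $r_u, r_v \ge 0$.

```latex
% Shorthand (ours): r_u = d_s^{out}(u) - 1 and r_v = d_s^{in}(v) - 1.
% Let u = x_0, x_1, \dots, x_k = v be a shortest path,
% so that d(u, x_j) = j and d(x_j, v) = k - j for every j.
\begin{align*}
k \le r_u + r_v
  &\;\Rightarrow\; x_{\min(k,\, r_u)} \in BFS^{out}(u, r_u) \cap BFS^{in}(v, r_v)
  && \text{(a common vertex)} \\
k = r_u + r_v + 1
  &\;\Rightarrow\; (x_{r_u}, x_{r_u + 1}) \in E,\;
     x_{r_u} \in BFS^{out}(u, r_u),\; x_{r_u + 1} \in BFS^{in}(v, r_v)
  && \text{(a crossing edge)} \\
\text{hence}\quad d(u, v) = k
  &\ge r_u + r_v + 2 = d_s^{out}(u) + d_s^{in}(v).
\end{align*}
```
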
Next, we lower-bound the estimate reported by the algorithm.

Lemma 6 Let $G = (V, E)$ be a graph of diameter $D = 3h + z$, where $h \ge 1$ and $z \in \{0, 1, 2\}$. If $\hat{D} =$
Approx-Diam(G) then $2h + z \le \hat{D} \le 3h + z$.

Proof. Let $a, b \in V$ be such that $d(a, b) = D$. Running the algorithm of Aingworth et al. for $G$ and the
reverse $G^R$ of $G$ implies that we get an approximation of $2h + z$ in the following cases.
Case 1: [$z \ne 2$]. From Lemma [2] we have that the estimate is at least $2h + z$.

Case 2: [$d_s^{out}(a) \le h$ or $d_s^{in}(b) \le h$]. If $d_s^{out}(a) \le h$ then the hitting set computed by the Aingworth et
al. algorithm contains a vertex at distance at most $h$ from $a$ and hence one of the BFS trees that it computes
has depth at least $2h + z$. Running the algorithm on $G^R$ guarantees that the same holds when $d_s^{in}(b) \le h$.

Case 3: [$\exists w \in V$ s.t. $d_s^{out}(w) \ge h + 2$]. In this case let $w$ be the vertex with the largest $d_s^{out}(w)$
value. The Aingworth et al. algorithm computes $BFS^{out}(w)$. If $d^{out}(w) \ge 2h + 2$ then the claim holds, so
assume that $d^{out}(w) \le 2h + 1$. The algorithm computes $BFS^{in}(u)$ for every $u \in BFS^{out}(w, h + 1)$ and
since $d(w, b) \le 2h + 1$ there is a vertex $w' \in BFS^{out}(w, h + 1)$ such that $d(w', b) \le h$. As the algorithm
computes $BFS^{in}(w')$ and $d(a, w') \ge 2h + z$ the claim holds.

For the rest of the proof we assume that the three cases above do not hold, hence $z = 2$, $d_s^{out}(a) = h + 1$
and $d_s^{in}(b) = h + 1$. The second part of our algorithm searches for a pair of vertices $u, v \in V$ such that there
is no edge from $BFS^{out}(u, d_s^{out}(u) - 1)$ to $BFS^{in}(v, d_s^{in}(v) - 1)$ (and no vertex in common between the
two trees). As $D = d(a, b) = 3h + 2 > 2h + 1$, and $d_s^{out}(a) - 1 = h$ and $d_s^{in}(b) - 1 = h$, we have that
there is no edge from $BFS^{out}(a, d_s^{out}(a) - 1)$ to $BFS^{in}(b, d_s^{in}(b) - 1)$ (and no vertex in common between
the two trees). Since the estimate reported by the algorithm is the maximum among values that also include
$d_s^{out}(a) + d_s^{in}(b) = 2h + 2$, we get that $\hat{D} \ge 2h + 2$, as required. □

Reminder of Theorem [4] Let $G = (V, E)$ be a directed or undirected unweighted graph with diameter
$D = 3h + z$, where $h \ge 0$ and $z \in \{0, 1, 2\}$. There is an $O(m^{2/3} n^{4/3})$ time algorithm that reports an
estimate $\hat{D}$ such that $2h + z \le \hat{D} \le D$.

Proof. The bounds on the estimate follow from Lemma [6] and Lemma [5]. Running the algorithm of
Aingworth et al. takes $O(m(s + n/s) + ns^2)$ time. Searching for a pair of vertices $u, v \in V$ such that there is
no edge from $BFS^{out}(u, d_s^{out}(u) - 1)$ to $BFS^{in}(v, d_s^{in}(v) - 1)$ takes $O(n^2 s^2)$ time. Setting $s = (m/n)^{1/3}$
gives us the running time. □
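
As a sanity check on the choice of $s$, the terms in the proof can be evaluated one by one. This is a sketch that takes the terms $m(s + n/s) + ns^2$ and $n^2 s^2$ as given and assumes $n \le m \le n^2$, so that $s \ge 1$.

```latex
% With s = (m/n)^{1/3}:
\begin{align*}
m \cdot \frac{n}{s} &= mn \left(\frac{n}{m}\right)^{1/3} = m^{2/3} n^{4/3}, \\
n^2 s^2 &= n^2 \left(\frac{m}{n}\right)^{2/3} = m^{2/3} n^{4/3}, \\
ms &= m^{4/3} n^{-1/3} \le m^{2/3} n^{4/3} \quad \text{since } m \le n^2, \\
ns^2 &= m^{2/3} n^{1/3} \le m^{2/3} n^{4/3}.
\end{align*}
```

The two dominant terms are balanced at $m^{2/3} n^{4/3}$ and the remaining two are no larger.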

We can use Theorem [4] to obtain an even better approximation for undirected graphs.

Reminder of Theorem [5] Let $G = (V, E)$ be an undirected unweighted graph with diameter $D$. There is
an $O(m^{2/3} n^{4/3})$ time algorithm that reports an estimate $\hat{D}$ such that $\lfloor 4D/5 \rfloor \le \hat{D} \le D$.

Proof. Using [8] we compute the distances between every pair of vertices in the graph, with an additive
error of 2, in $O(\min(n^{3/2} \sqrt{m}, n^{7/3}))$ time. If $\hat{D}$ is the maximum estimated distance minus 2 then $D - 2 \le \hat{D} \le D$.
For every $D \ge 6$ we have that $D - 2 \ge \lfloor 4D/5 \rfloor$. Thus, when $\hat{D} \ge 4$ we get an estimate of at least $\lfloor 4D/5 \rfloor$.
If $\hat{D} = 3$ then $D$ might be either 3, 4 or 5, that is, $D = 3 + z$, where $z \in \{0, 1, 2\}$. If $D = 5$, an estimate of
3 is not good enough, thus we run Approx-Diam(G). Let $D'$ be the estimate reported by Approx-Diam(G).
From Lemma [6] it follows that if $D = 5$ then $D' \ge 4$ and we have the required approximation. If $\hat{D} = 2$
then $D$ might be either 2, 3 or 4, and for this case we can just use the Aingworth et al. algorithm to get an
estimate of 3 whenever $D = 4$, which gives the desired approximation. □
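
The arithmetic behind this case analysis can be checked mechanically. The sketch below is ours (the helper names `floor_four_fifths` and `lemma6_bound` are hypothetical) and only verifies the inequalities used in the proof, not the algorithm itself.

```python
def floor_four_fifths(D):
    """The target lower bound floor(4D/5)."""
    return (4 * D) // 5

def lemma6_bound(D):
    """Lower bound 2h + z for D = 3h + z with z in {0, 1, 2}."""
    h, z = divmod(D, 3)
    return 2 * h + z

# An additive error of 2 is good enough from D = 6 on ...
assert all(D - 2 >= floor_four_fifths(D) for D in range(6, 10_000))
# ... but not for D = 5, where D - 2 = 3 is below floor(4 * 5 / 5) = 4.
assert 5 - 2 < floor_four_fifths(5)
# The 2h + z guarantee closes the gap for the small diameters 3, 4 and 5.
assert all(lemma6_bound(D) >= floor_four_fifths(D) for D in (3, 4, 5))
```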

5.2 Sparse graphs 

We now show that it is possible to obtain the better approximation also in $O(m^{2 - \varepsilon})$ time for constant $\varepsilon > 0$
when the diameter of the given graph is constant.

Our algorithm, Approx-Diam-Sparse(G, $\hat{h}$), is given an estimate $\hat{h}$ of $h$ such that $\hat{h} \ge h$ and works as
follows. (The pseudocode can be found in the appendix.) Let $\Delta$ be a parameter and let $H$ be the set of
vertices of outdegree at least $\Delta$. For every vertex of $H$, the algorithm computes an outgoing BFS tree.
Then, it computes the distance from every node in $V \setminus H$ to $H$. This is done by adding an extra node $r$ to the
graph with edges from each node of $H$ to $r$ and then computing an incoming BFS to $r$ in $O(m)$ time. The
distance of a node $v$ to $H$ is its distance to $r$, minus 1. The algorithm then picks the vertex $w$ that is furthest
from $H$ and computes $BFS^{out}(w)$. Let $h' = \min\{\hat{h} + 1, d(w, H)\}$. The algorithm computes $BFS^{in}(v)$
for every $v \in BFS^{out}(w, h')$. Finally, it returns the maximum depth of all computed BFS trees.
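
The steps above can be sketched in a few lines. This is a minimal illustration under our own assumptions rather than the appendix pseudocode: the graph is a dictionary of adjacency lists and is assumed strongly connected, the names `bfs`, `approx_diam_sparse`, `h_hat` and `delta` are ours, and the extra node $r$ is replaced by an equivalent multi-source BFS from $H$ on the reversed graph.

```python
from collections import deque

def bfs(adj, sources):
    """Multi-source BFS; returns hop distances from the nearest source."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def approx_diam_sparse(adj, h_hat, delta):
    # Reversed adjacency lists, used for incoming BFS trees.
    radj = {v: [] for v in adj}
    for x in adj:
        for y in adj[x]:
            radj[y].append(x)
    # H: vertices of outdegree at least delta; one outgoing BFS from each.
    high = [v for v in adj if len(adj[v]) >= delta]
    best = 0
    for v in high:
        best = max(best, max(bfs(adj, [v]).values()))
    # d(v, H) for all v: BFS from H on the reversed graph (stands in for node r).
    to_high = bfs(radj, high) if high else {}
    inf = float("inf")
    w = max(adj, key=lambda v: to_high.get(v, inf))  # vertex furthest from H
    from_w = bfs(adj, [w])
    best = max(best, max(from_w.values()))
    h_prime = min(h_hat + 1, to_high.get(w, inf))
    # Incoming BFS from every vertex within h' hops of w.
    for v, d in from_w.items():
        if d <= h_prime:
            best = max(best, max(bfs(radj, [v]).values()))
    return best

# Undirected path 0-1-2-3-4-5 (diameter 5, h = 1, z = 2) as a symmetric digraph.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
print(approx_diam_sparse(path, h_hat=1, delta=2))  # prints 5
```

When $H$ is empty the sketch treats every vertex as infinitely far from $H$, so $h' = \hat{h} + 1$. That convention is our choice and is not taken from the text.
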
We now analyze the quality of the approximation. 

Lemma 7 Let $G = (V, E)$ be a graph of constant diameter $D = 3h + z$, where $h \ge 0$ and $z \in \{0, 1, 2\}$. If
$\hat{D} =$ Approx-Diam-Sparse(G, $\hat{h}$) for $\hat{h} \ge h$, then $2h + z \le \hat{D} \le D$.

Proof. First notice that in any case the algorithm returns the depth of some BFS tree in the graph, thus
$\hat{D} \le D$.

Now, let $a, b \in V$ be such that $d(a, b) = D$ and let $H \subseteq V$ be the set of vertices of outdegree at least $\Delta$. Let
$y^o \in H$ be the vertex with the deepest outgoing BFS tree in $H$. Let $y^i$ be the vertex with the deepest incoming
BFS tree among the vertices of $BFS^{out}(w, h')$, where $h' = \min\{\hat{h} + 1, d(w, H)\}$. The algorithm returns as an
estimate $\max(d^{out}(y^o), d^{out}(w), d^{in}(y^i))$.

If $d(a, H) \le h$, then $d^{out}(y^o)$ is at least $2h + z$ and the estimate is of the desired quality. So assume
that $d(a, H) > h$, and hence $d(w, H) \ge d(a, H) \ge h + 1$. Thus $h' \ge h + 1$, as we also have $\hat{h} \ge h$ by
assumption. Assume also that $BFS^{out}(w)$ is of depth at most $2h + z - 1$, since if it is of depth at least $2h + z$
then the estimate is of the desired quality. Then there is a vertex $w' \in BFS^{out}(w, h')$ on the shortest path
from $w$ to $b$ with $d(w, w') = h + 1$ and hence $d(w', b) \le h + z - 2$. As $d(a, b) = 3h + z$, we must also
have $d(a, w') \ge 2h + 2$ and, as $d^{in}(y^i) \ge d(a, w')$, the estimate is of the desired quality. □
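
The two distance bounds in the last step follow from the position of $w'$ on the shortest path and from the triangle inequality. Written out as a short derivation (ours, using only the quantities defined in the proof):

```latex
\begin{align*}
d(w', b) &= d(w, b) - d(w, w') \le (2h + z - 1) - (h + 1) = h + z - 2, \\
d(a, b) &\le d(a, w') + d(w', b) \\
\Rightarrow\quad d(a, w') &\ge d(a, b) - d(w', b) \ge (3h + z) - (h + z - 2) = 2h + 2 \ge 2h + z .
\end{align*}
```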

Next, we analyze the running time of the algorithm. 

Lemma 8 Let $G = (V, E)$ be a graph of diameter $D = 3h + z$, where $h \ge 0$ and $z \in \{0, 1, 2\}$. If $\hat{h} \ge h$,
Approx-Diam-Sparse(G, $\hat{h}$) runs in $O(m^2/\Delta + \Delta^{\hat{h}+1} m)$ time.

Proof. The algorithm computes a BFS tree for every vertex of $H$. We have $|H| = O(m/\Delta)$ since there are at most
that many vertices of outdegree at least $\Delta$. Hence the BFS computation from $H$ takes $O(m^2/\Delta)$ time.

Computing the distances of the nodes in $V \setminus H$ to $H$ takes only $O(m)$ time. Picking the node $w$ at
largest distance to $H$ takes $O(n)$ time. The algorithm computes $BFS^{out}(w)$ in $O(m)$ time. It then computes
$BFS^{in}(v)$ for every $v \in BFS^{out}(w, h')$ where $h' \le \hat{h} + 1$. Since we also have that $h' \le d(w, H)$, every
$v \in BFS^{out}(w, h' - 1)$ has outdegree at most $\Delta$. Thus, $|BFS^{out}(w, h')| \le \Delta^{h'} \le \Delta^{\hat{h}+1}$. The running
time of computing $BFS^{in}(v)$ for every $v \in BFS^{out}(w, h')$ is hence at most $O(m\Delta^{\hat{h}+1})$. □

We now prove Theorem [6] from the introduction. 

Reminder of Theorem [6] There is an $O(m^{2 - 1/(2h+3)})$ time deterministic algorithm that computes an
estimate $\hat{D}$ with $\lceil 2D/3 \rceil \le \hat{D} \le D$ for all $m$-edge unweighted graphs of diameter $D = 3h + z$ with $h \ge 0$
and $z \in \{0, 1, 2\}$. In particular, $\hat{D} \ge 2h + z$.

Proof. In $O(m)$ time we can get a 2-approximation to the diameter, i.e. an estimate $E$ with $D/2 \le
E \le D$. Since $D = 3h + z$, we have that $(E - 2)/3 \le h \le 2E/3$. Setting $\hat{h} = 2E/3$ guarantees that
$h \le \hat{h} \le 2h + 4/3 < 2h + 2$, and hence $h \le \hat{h} \le 2h + 1$.

The quality of the estimate follows from Lemma [7], and by Lemma [8] the runtime is $O(m^2/\Delta + m\Delta^{2h+2})$.
Picking $\Delta = m^{1/(2h+3)}$ minimizes the running time at $O(m^{2 - 1/(2h+3)})$. □
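
The choice of $\Delta$ comes from balancing the two terms of the running time. A short derivation, using only the bound stated in the proof:

```latex
\begin{align*}
\frac{m^2}{\Delta} = m \Delta^{2h+2}
  &\iff \Delta^{2h+3} = m
  \iff \Delta = m^{1/(2h+3)}, \\
\frac{m^2}{\Delta} &= m^{2 - 1/(2h+3)}, \qquad
m \Delta^{2h+2} = m^{1 + (2h+2)/(2h+3)} = m^{2 - 1/(2h+3)} .
\end{align*}
```

For instance, $h = 1$ gives $\Delta = m^{1/5}$ and a running time of $O(m^{9/5})$.
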
6 Appendix



Algorithm 1: Approx-Diam(G)

$X_1 \leftarrow$ Aingworth($G$);
$X_2 \leftarrow$ Aingworth($G^R$);
$\hat{D} \leftarrow \max(X_1, X_2)$;
foreach $v \in V$ do
    foreach $u \in V \setminus \{v\}$ do
        if $BFS^{out}(u, d_s^{out}(u) - 1) \cap BFS^{in}(v, d_s^{in}(v) - 1) = \emptyset$ $\wedge$ $\nexists (u', v') \in E$ s.t.
        $u' \in BFS^{out}(u, d_s^{out}(u) - 1) \wedge v' \in BFS^{in}(v, d_s^{in}(v) - 1)$ then
            $\hat{D} \leftarrow \max(\hat{D}, d_s^{out}(u) + d_s^{in}(v))$

return $\hat{D}$;



Algorithm 2: Approx-Diam-Sparse(G, $\hat{h}$)

$H \leftarrow \{v \mid deg^{out}(v) \ge \Delta\}$;
foreach $v \in H$ do Compute $BFS^{out}(v)$;
$y^o \leftarrow \arg\max_{x \in H} d^{out}(x)$;
$\hat{D} \leftarrow d^{out}(y^o)$;
Compute $d(v, H)$ for all $v \in V$ with a single BFS;
$w \leftarrow$ vertex of largest $d(w, H)$;
Compute $BFS^{out}(w)$;
$\hat{D} \leftarrow \max\{\hat{D}, d^{out}(w)\}$;
$h' \leftarrow \min\{\hat{h} + 1, d(w, H)\}$;
foreach $v \in BFS^{out}(w, h')$ do Compute $BFS^{in}(v)$;
$y^i \leftarrow \arg\max_{x \in BFS^{out}(w, h')} d^{in}(x)$;
$\hat{D} \leftarrow \max\{\hat{D}, d^{in}(y^i)\}$;
return $\hat{D}$;